WASPBENCH: a lexicographer's workbench incorporating state-of-the-art word sense disambiguation

Authors

  • Adam Kilgarriff
  • Roger Evans
  • Rob Koeling
  • David Tugwell

Abstract

Human Language Technologies (HLT) need dictionaries to tell them what words mean and how they behave. People who make dictionaries (lexicographers) need HLT to help them identify how words behave, so that they can make better dictionaries. There is thus potential for synergy across the whole range of lexical data: in the construction of headword lists, and for spelling correction, phonetics, morphology and syntax. Nowhere is it greater than for semantics, and in particular the vexed question of how a word's meaning should be analysed into distinct senses. HLT needs all the help it can get from dictionaries, because identifying which meaning of a word applies is a very hard problem. Lexicographers need all the help they can get because the analysis of meaning is the second hardest part of their job (Kilgarriff, 1998). It occupies a large share of their working hours, and it is a task where, currently, they have very little to go on beyond intuition and other dictionaries. HLT system developers and corpus lexicographers can therefore both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly lexicons for Machine Translation. We have developed the WASPBENCH, a tool that (1) presents a "word sketch", a summary of the corpus evidence for a word, to the lexicographer; (2) supports the lexicographer in analysing the word into its distinct meanings; and (3) uses the lexicographer's analysis as the input to a state-of-the-art word sense disambiguation (WSD) algorithm, whose output is a "word expert" that can then disambiguate new instances of the word.
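The three-stage workflow described above can be illustrated with a toy example. The stages are: a word sketch, the lexicographer's sense analysis, and a trained word expert. This is a minimal sketch under stated assumptions, not the WASPBENCH implementation:

- Corpus instances are assumed to be already parsed into (grammatical relation, collocate) pairs.
- The relation names, the toy "bank" data, and the functions `word_sketch`, `train_word_expert` and `disambiguate` are all hypothetical.
- Collocates are ranked by raw counts rather than by a salience statistic.
- The word expert is a single training pass of a Yarowsky-style decision list, swapped in for illustration, with no bootstrapping iterations.

```python
import math
from collections import Counter, defaultdict

# Hypothetical data model: each corpus instance of the target word is reduced to
# the set of (grammatical relation, collocate) pairs found around it.
Feature = tuple[str, str]


def word_sketch(instances: list[set[Feature]], top_n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Group collocates by relation, most frequent first.

    Assumption: ranks by raw co-occurrence count, not by a salience statistic.
    """
    counts = Counter(f for inst in instances for f in inst)
    sketch: dict[str, list[tuple[str, int]]] = defaultdict(list)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    for (rel, coll), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(sketch[rel]) < top_n:
            sketch[rel].append((coll, n))
    return dict(sketch)


def train_word_expert(instances, seeds, alpha=0.1):
    """One pass of a Yarowsky-style decision list (bootstrapping loop omitted).

    `seeds` maps clue features chosen by the lexicographer to sense labels.
    Returns (score, feature, sense) rules, strongest first.
    """
    feature_senses: dict[Feature, Counter] = defaultdict(Counter)
    for inst in instances:
        senses = {seeds[f] for f in inst if f in seeds}
        if len(senses) != 1:
            continue  # no seed clue, or conflicting clues: leave unlabelled
        sense = senses.pop()
        for f in inst:
            feature_senses[f][sense] += 1
    rules = []
    for f, by_sense in feature_senses.items():
        best, n_best = max(by_sense.items(), key=lambda kv: (kv[1], kv[0]))
        n_rest = sum(by_sense.values()) - n_best
        # Smoothed log ratio of the best sense's count against all other senses.
        rules.append((math.log((n_best + alpha) / (n_rest + alpha)), f, best))
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules


def disambiguate(rules, instance, default=None):
    """Return the sense of the strongest rule whose feature occurs in the instance."""
    for _score, f, sense in rules:
        if f in instance:
            return sense
    return default


# Toy data for the target word "bank" (invented for illustration).
corpus = [
    {("object_of", "rob"), ("modifier", "central")},
    {("object_of", "rob"), ("and_or", "building_society")},
    {("modifier", "central"), ("object_of", "join")},
    {("modifier", "river"), ("pp_of", "Thames")},
    {("modifier", "river"), ("object_of", "burst")},
    {("pp_of", "Thames"), ("object_of", "walk_along")},
]
# The lexicographer's analysis, expressed as seed clues per sense.
seeds = {("modifier", "central"): "finance", ("modifier", "river"): "river"}

if __name__ == "__main__":
    print(word_sketch(corpus, top_n=2))
    rules = train_word_expert(corpus, seeds)
    print(disambiguate(rules, {("object_of", "rob"), ("and_or", "building_society")}))
```

On this toy data, the second instance carries no seed clue. The word expert nevertheless labels it "finance", through the `rob` rule it learned from the first instance. This mirrors how a word expert generalizes from the lexicographer's analysis to new instances of the word.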

Publication date: 2003